Lesson 5. Data-Driven Mapping

Data-driven mapping refers to the process of using data values to determine the symbology of mapped features. Color, shape, and size are the three most common graphic elements used to symbolize data-driven maps. Data-driven maps are often referred to as thematic maps. In this lesson we take a deep dive into data-driven mapping!


Instructor Notes

Types of Thematic Maps

There are two primary types of thematic maps:

  • Choropleth maps, which set the color of areas (polygons) by data value

  • Point symbol maps, which set the color or size of points by data value

Many of the techniques for creating these maps can also be used with line data, although it is less common.

We review both of these types of maps in more detail in this lesson. First, let’s load the R libraries we will use.

library(sf)
library(tmap)
library(here)

5.1 Choropleth Maps

Choropleth maps are the most common type of thematic map.

Let’s use an sf data.frame of California counties to make a choropleth map.

First, read in the counties data with the st_read function.

counties <- st_read(here("notebook_data",
                         "california_counties",
                         "CaliforniaCounties.shp"))
## Reading layer `CaliforniaCounties' from data source 
##   `/Users/pattyf/Documents/Dlab/workshops/AY2022/R-Geospatial-Fundamentals/notebook_data/california_counties/CaliforniaCounties.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022
## Projected CRS: NAD83 / California Albers

Then, make a basic map of our county boundaries.

plot(counties$geometry)

Now, take a look at the spatial dataframe.

head(counties)
## Simple feature collection with 6 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6
## Projected CRS: NAD83 / California Albers
##          NAME STATE_NAME POP2010 POP10_SQMI POP2012  POP12_SQMI   WHITE  BLACK
## 1        Kern California  839631      102.9  851089  104.282870  499766  48921
## 2       Kings California  152982      109.9  155039  111.427421   83027  11014
## 3        Lake California   64665       48.6   65253   49.082334   52033   1232
## 4      Lassen California   34895        7.4   35039    7.422856   25532   2834
## 5 Los Angeles California 9818605     2402.3 9904341 2423.264150 4936599 856874
## 6      Madera California  150865       70.1  153025   71.065672   94456   5629
##   AMERI_ES   ASIAN HAWN_PI HISPANIC   OTHER MULT_RACE   MALES FEMALES MED_AGE
## 1    12676   34846    1252   413033  204314     37856  433108  406523    30.7
## 2     2562    5620     271    77866   42996      7492   86344   66638    31.1
## 3     2049     724     108    11088    5455      3064   32469   32196    45.0
## 4     1234     356     165     6117    3562      1212   22416   12479    37.0
## 5    72828 1346865   26094  4687889 2140632    438713 4839654 4978951    34.8
## 6     4136    2802     162    80992   37380      6300   72682   78183    33.1
##   AVE_HH_SZ AVE_FAM_SZ HSE_UNITS VACANT OWNER_OCC RENTER_OCC CountyFIPS
## 1      3.15       3.61    284367  29757    152828     101782      06103
## 2      3.19       3.59     43867   2634     22329      18904      06089
## 3      2.39       2.94     35492   8944     17472       9076      06106
## 4      2.50       2.98     12710   2652      6590       3468      06086
## 5      2.98       3.58   3445076 203872   1544749    1696455      06073
## 6      3.28       3.63     49140   5823     27726      15591      06102
##                         geometry
## 1 MULTIPOLYGON (((213672.6 -2...
## 2 MULTIPOLYGON (((12524.03 -1...
## 3 MULTIPOLYGON (((-235734.3 1...
## 4 MULTIPOLYGON (((12.28914 35...
## 5 MULTIPOLYGON (((173874.5 -4...
## 6 MULTIPOLYGON (((16681.16 -1...

In particular, we are interested in the columns with numeric values, as these are the ones typically used to make data-driven maps.

To get started, let’s create a choropleth map by setting the color of each county based on the value in the population per square mile column (POP12_SQMI).

Recall that sf’s plot method does this by default! So, here’s the quickest way to make a choropleth:

plot(counties['POP12_SQMI'])

By default, sf::plot linearly scales the colors to the data values. This is called a proportional color map.
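Here is a minimal base R sketch that makes linear color scaling concrete. It is not sf’s actual plotting code, which may group values into classes rather than interpolate every value. It rescales each value to the 0 to 1 range and interpolates between yellow and red. The values are the six POP12_SQMI densities printed by head(counties) above.

```r
# POP12_SQMI for the six counties printed by head(counties)
pop_density <- c(Kern = 104.282870, Kings = 111.427421, Lake = 49.082334,
                 Lassen = 7.422856, `Los Angeles` = 2423.264150, Madera = 71.065672)

# Rescale the values linearly to the range [0, 1]
scaled <- (pop_density - min(pop_density)) / (max(pop_density) - min(pop_density))

# colorRamp() returns a function that maps [0, 1] to RGB values on a 0-255 scale
ramp <- colorRamp(c("yellow", "red"))
cols <- rgb(ramp(scaled), maxColorValue = 255)
names(cols) <- names(pop_density)
cols
```

Los Angeles is far denser than the other five counties, so all of them end up close to pure yellow.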

Choropleth mapping with tmap

We can also use tmap to create thematic maps. This package gives us greater control over the visualization details so we can better explore the distribution of data values.

In tmap, instead of setting the col argument to the same static value for all features (e.g. ‘red’, ‘#ef03a5’), we can set it to the name of the column by which we want our polygons colored (e.g. ‘POP12_SQMI’).

# Set the mapping mode to a static plot (not interactive)
tmap_mode('plot')  
## tmap mode set to plotting
# Map the county polygons colored by the values in the POP12_SQMI column
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              title = "Population Density per mi^2")

By default, tmap uses a yellow-orange-brown (YlOrBr) sequential color palette for thematic maps and bins those colors into 3 to 7 classes of approximately equal intervals with rounded values for class breaks.
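The rounded class breaks come from the "pretty" classification style. Here is a rough illustration of the idea using base R’s pretty() function, which picks round, equally spaced values that cover the range of the data. This is not tmap’s internal code.

```r
# POP12_SQMI for the six counties printed by head(counties)
pop_density <- c(104.282870, 111.427421, 49.082334, 7.422856, 2423.264150, 71.065672)

# Ask for about 5 intervals; pretty() may return a slightly different number
brks <- pretty(pop_density, n = 5)
brks

# Count how many counties fall into each interval
table(cut(pop_density, breaks = brks, include.lowest = TRUE))
```

This sample has one very dense county, so most counties share the lowest class.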

Of course, we can also use tmap’s interactive mapping mode. Do you recall the syntax for:

  • setting the tmap mode to static vs interactive mapping?

  • or toggling between these two modes?

Let’s make an interactive map and make our layer partially transparent (alpha = 0.5) so that we can see the basemap through our polygons.

  • This transparency may be more or less noticeable depending on the selected basemap!
tmap_mode('view')
## tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)  # force tmap to display invalid polygons

tm_shape(counties) +
  tm_polygons(col='POP12_SQMI', alpha=0.5,
              title = "Population Density per mi^2")
## Warning: The shape counties is invalid (after reprojection). See sf::st_is_valid

That’s really the heart of creating a choropleth map with tmap. To set the color of the features based on the values in a column, set the col argument to the name of that column in the sf data.frame (as a quoted string!).

Before we move on, let’s use the st_make_valid function to fix the county geometry!

counties <- st_make_valid(counties)

Practice

Redo the map above, but map population (POP2012), NOT population density.

# Map of County Population (POP2012)

Question

Which map better conveys CA county population: POP12_SQMI or POP2012?

The Challenge of Thematic Maps

The goal of a thematic map is to use color to visualize the spatial distribution of a variable in order to identify trends and outliers.

Another goal is to use color to effectively and quickly convey information. For example,

  • maps use brighter or richer colors to signify higher values,

  • and leverage cognitive associations such as mapping water with the color blue.

There are two major challenges when creating thematic maps:

  1. Our eyes are drawn to the color of larger areas or linear features, even if the values of smaller features are more significant.

  2. The range of data values is rarely evenly distributed across all observations and thus the colors can be misleading.

Questions

  • Do you see either of these problems in our population-density map?

    • Take a look at the histogram below as you consider the above question.
hist(counties$POP12_SQMI,
     breaks = 40, 
     main = 'Population Density per mi^2')

There are three main techniques for dealing with these mapping challenges:

  1. Color palettes

  2. Data transformations

  3. Classification schemes

5.2 Color Palettes

There are three main types of color palettes (or color maps), each of which has a different purpose:

  • diverging - a “diverging” set of colors is used to emphasize mid-range values as well as extremes.

  • sequential - usually a single hue or multiple hues used to emphasize differences in order and magnitude, where darker colors typically mean higher values

  • qualitative - a contrasting set of colors to identify distinct categories and avoid implying quantitative significance.

Tip: Sites like ColorBrewer let you play around with different types of color maps.

To see the names of the ColorBrewer palettes available to tmap, try the following command. You may need to enlarge the output image.

RColorBrewer::display.brewer.all()
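The RColorBrewer package also stores the palette types in a data frame called brewer.pal.info. Its category column labels each palette as sequential ("seq"), diverging ("div"), or qualitative ("qual"). Here is a short sketch for listing palette names by type:

```r
library(RColorBrewer)

# One row per palette: rownames are palette names, `category` is the palette type
info <- brewer.pal.info

# How many palettes of each type are available
table(info$category)

# Qualitative palettes (for categorical data)
qual_pals <- rownames(info)[info$category == "qual"]
qual_pals

# Sequential palettes (for ordered, quantitative data)
seq_pals <- rownames(info)[info$category == "seq"]
seq_pals
```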

As a best practice, a qualitative color palette should not be used with quantitative data and vice versa. For example, consider this map that EDM.com published of top dance tracks by state.

5.3 Transforming Count Data

For a number of reasons, data are often distributed in aggregated form. For example, the Census Bureau collects data from individual people, households, and businesses, and releases it aggregated to geographic units such as states, counties, and census tracts.

When the aggregated data are counts, like total population, they can be transformed to densities, proportions and ratios. These normalized variables are more comparable across regions that differ greatly in size.

Let’s consider this in terms of our data.

  • Counts
    • data counts, aggregated by feature
      • e.g. population within a county
  • Densities
    • counts aggregated by feature and normalized by feature area
      • e.g. population per square mile within a county
  • Proportions / Percentages
    • value in a specific category divided by the total value across all categories
      • e.g. proportion of the county population that is non-white
  • Rates / Ratios
    • value in one category divided by value in another category, e.g.:
      • COVID-19 cases per 100,000 persons
      • COVID-19 R Factor: number of people likely to be infected by one person with COVID-19
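Here is a sketch of these transformations using three counties copied from the head(counties) output above.

- The race/ethnicity and sex columns appear to be 2010 counts, because MALES + FEMALES equals POP2010. The sketch therefore compares them against POP2010.
- County area is not in the printed output, so the sketch backs it out from POP2012 / POP12_SQMI.
- With the real sf object you could instead compute polygon area from the geometry with sf::st_area(), which returns square meters for this projected CRS.

```r
# Values copied from the head(counties) output (Kern, Lassen, Los Angeles)
cty <- data.frame(
  NAME       = c("Kern", "Lassen", "Los Angeles"),
  POP2010    = c(839631, 34895, 9818605),     # count
  POP2012    = c(851089, 35039, 9904341),
  POP12_SQMI = c(104.282870, 7.422856, 2423.264150),
  HISPANIC   = c(413033, 6117, 4687889),
  MALES      = c(433108, 22416, 4839654),
  FEMALES    = c(406523, 12479, 4978951)
)

# Density: count divided by area (area in sq mi implied by the 2012 columns)
cty$area_sqmi  <- cty$POP2012 / cty$POP12_SQMI
cty$density_10 <- cty$POP2010 / cty$area_sqmi

# Proportion (as a percentage): one category divided by the total of all categories
cty$pct_hispanic <- cty$HISPANIC / cty$POP2010 * 100

# Ratio: one category divided by another category (males per 100 females)
cty$males_per_100f <- cty$MALES / cty$FEMALES * 100

cty[, c("NAME", "density_10", "pct_hispanic", "males_per_100f")]
```

The recomputed 2010 densities match the POP10_SQMI column (102.9, 7.4, 2402.3) to one decimal place. This is a handy sanity check on the transformation.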

The basic cartographic rule is that when mapping data for areas that differ in size, you should rarely map raw counts, because those differences in size make the comparison less valid or informative.

5.4 Classification schemes

Another way to make more meaningful maps is to improve the way in which data values are associated with colors.

The common alternative to a proportional color map is to use a classification scheme to create a graduated color map. This is the standard way to create a choropleth map.

A classification scheme is a method for binning continuous data values into 4-7 classes (the default is 5) and then associating those classes with the different colors in a color palette.

Commonly used classification schemes include:

  • Equal intervals or Pretty
    • equal-size data ranges (e.g., values within 0-10, 10-20, 20-30, etc.)
    • pros:
      • best for data spread across entire range of values
      • easily understood by map readers
    • cons:
      • avoid if you have highly skewed data or a few big outliers because one or more of the bins may have no data observations
  • Quantiles
    • equal number of observations in each bin
    • pros:
      • looks nice, because it best spreads colors across the full set of data values
      • thus, it’s often the default scheme for mapping software
    • cons:
      • bin ranges based on the number of observations, not on the data values
      • thus, different classes can have very similar or very different values.
  • Natural breaks
    • minimize within-class variance and maximize between-class differences
    • e.g. Fisher-Jenks
    • pros:
      • great for exploratory data analysis, because it can identify natural groupings
    • cons:
      • class breaks are best fit to one dataset, so the same bins can’t always be used for multiple years
  • Head/Tails
    • a relatively new scheme for data with a heavy-tailed distribution
  • Manual
    • classifications are user-defined
    • pros:
      • especially useful if you want to slightly change the breaks produced by another scheme
      • can be used as a fixed set of breaks to compare data over time
    • cons:
      • more work involved
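The trade-off between equal intervals and quantiles is easiest to see on skewed data. This base R sketch computes five-class breaks both ways for the six POP12_SQMI values printed by head(counties), then counts the counties in each class. It is an illustration, not tmap’s internal code.

```r
# POP12_SQMI for the six counties printed by head(counties)
pop_density <- c(104.282870, 111.427421, 49.082334, 7.422856, 2423.264150, 71.065672)
n_classes <- 5

# Equal intervals: split the full range into equal-width bins
eq_breaks <- seq(min(pop_density), max(pop_density), length.out = n_classes + 1)
eq_counts <- table(cut(pop_density, eq_breaks, include.lowest = TRUE))

# Quantiles: put (roughly) the same number of observations in each bin
q_breaks <- quantile(pop_density, probs = seq(0, 1, length.out = n_classes + 1))
q_counts <- table(cut(pop_density, q_breaks, include.lowest = TRUE))

eq_counts  # three of the five bins are empty: one outlier stretches the range
q_counts   # every bin gets at least one county, but bin widths vary wildly
```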

Classification schemes and tmap

Classification schemes can be implemented using the tmap geometry functions (tm_polygons, tm_dots, etc.) by setting a value for the style argument.

Here are some of the tmap keyword names for the different classification styles (see the documentation: ?tm_polygons):

  • equal, quantile, fisher, jenks, headtails, fixed, kmeans, pretty.

For more information about classification schemes, see ?classInt::classIntervals or sources such as this page in the Geocomputation with R ebook.
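Head/tails is probably the least familiar of these schemes. Here is a minimal sketch of the idea behind it:

- Split the values at their mean.
- Keep subdividing the "head" (the values above the mean) while it remains a small minority.

head_tail_breaks() is a hypothetical, simplified helper, not the classInt or tmap implementation. The 40% threshold and the example values are assumptions chosen for illustration.

```r
# Hypothetical helper: simplified head/tail breaks
head_tail_breaks <- function(x, head_limit = 0.4) {
  brks <- c(min(x), max(x))
  repeat {
    m <- mean(x)
    brks <- c(brks, m)        # each mean becomes a class break
    head <- x[x > m]          # values above the mean form the "head"
    # stop when the head is no longer a small minority or is too small to split
    if (length(head) / length(x) > head_limit || length(head) <= 1) break
    x <- head                 # otherwise, keep subdividing the head
  }
  sort(unique(brks))
}

# Made-up heavy-tailed values: many small ones and a few very large ones
vals <- c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 40, 120)
ht <- head_tail_breaks(vals)
ht
```

With these values, the first mean (14.8) splits off a head of 3 out of 15 values. The second mean (about 58.3) leaves a single value in the head, so the process stops with three classes.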


Classification schemes in action

Let’s redo the previous map using the quantile classification scheme.

  • What is different about the code? About the output map?
tmap_mode('plot')
## tmap mode set to plotting
# Plot population density - mile^2
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              style = "quantile",
              alpha = 0.5,
              title = "Population Density per mi^2")

Practice

Redo the previous map with these classification schemes: headtails, equal, jenks

  • Which one do you like best?

User Defined Classification Schemes

You may get pretty close to your final map without being completely satisfied. In this case you can manually define a classification scheme.

Let’s customize our map with a user-defined classification scheme, where we manually set the breaks for the bins by using style = 'fixed' together with the breaks argument.

tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              palette = "YlGn", 
              style = 'fixed',
              breaks = c(0, 50, 100, 200, 300, 400, max(counties$POP12_SQMI)),
              title = "Population Density per Square Mile")
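With style = 'fixed', each value is assigned to the bin whose limits contain it. One way to check where the six counties printed by head(counties) would land is base R’s cut() with the same kind of breaks.

- The sketch passes right = FALSE so that intervals are closed on the left. We believe this mirrors tmap’s default, but treat that as an assumption.
- Interval closure only matters for values that fall exactly on a break.

```r
# POP12_SQMI for the six counties printed by head(counties)
pop_density <- c(Kern = 104.282870, Kings = 111.427421, Lake = 49.082334,
                 Lassen = 7.422856, `Los Angeles` = 2423.264150, Madera = 71.065672)

# Same breaks as the map; the top break is the largest of these six values
brks <- c(0, 50, 100, 200, 300, 400, max(pop_density))

# right = FALSE gives [a, b) intervals; include.lowest = TRUE then closes the
# top interval on the right so the maximum value is not dropped
bins <- cut(pop_density, breaks = brks, right = FALSE, include.lowest = TRUE)

data.frame(county = names(pop_density), bin = bins)
table(bins)  # number of counties in each bin
```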

Since we are customizing our plot, we can also edit our legend to specify the text, so that it’s easier to read.

  • We’ll use tm_add_legend to build our own customized legend.
tm_shape(counties) + 
  tm_polygons(col = 'POP12_SQMI',
              palette = "YlGn", 
              style='fixed',
              breaks = c(0, 50, 100, 200, 300, 400, max(counties$POP12_SQMI)),
              legend.show = F) +
tm_add_legend('fill', col = RColorBrewer::brewer.pal(6, "YlGn"),
              border.col = "black",
              title = "Population Density per Sq Mile",
              labels = c('<50','50 to 100','100 to 200','200 to 300','300 to 400','>400'))
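tm_add_legend() builds the legend independently of the map layer, so the number of colors and labels has to be matched to the number of bins by hand. One way to keep them in sync is to derive both from the breaks vector. This sketch assumes you want the same label style as above. The top value below is a stand-in for max(counties$POP12_SQMI), not the real statewide maximum.

```r
library(RColorBrewer)

# Breaks as in the map above; `top` stands in for max(counties$POP12_SQMI)
top  <- 2423.264150
brks <- c(0, 50, 100, 200, 300, 400, top)
n_bins <- length(brks) - 1   # number of bins defined by the breaks

# "<first" for the lowest bin, "a to b" for the middle bins, ">last" for the top bin
middle <- paste(brks[2:(n_bins - 1)], "to", brks[3:n_bins])
labels <- c(paste0("<", brks[2]), middle, paste0(">", brks[n_bins]))

# One color per bin, from the same palette as the map layer
cols <- brewer.pal(n_bins, "YlGn")

labels
```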

Let’s create a new variable to map

If we look at the columns in our dataset, we see we have a number of variables from which we can calculate proportions, rates, and the like.

Let’s try that out. First, take another look at the columns:

head(counties)

Let’s calculate the percent of the population that is Hispanic and save it to a new column. Then, we can use that to create a choropleth map.

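# calculate percent Hispanic as a new column
# HISPANIC is a 2010 Census count (the race and sex columns sum to POP2010),
# so divide by POP2010 rather than POP2012
counties$pct_hispanic <- counties$HISPANIC / counties$POP2010 * 100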

# Plot percent hispanic as choropleth
tm_shape(counties) + 
  tm_polygons(col = 'pct_hispanic',
              palette = 'Blues', 
              style = 'fixed',
              breaks = c(0, 20, 40, 60, 80, 100),
              border.col = "darkgrey",
              lwd = 1.5,
              legend.show = F) + 
tm_add_legend('fill', col = RColorBrewer::brewer.pal(5, "Blues"),
              border.col = "darkgrey",
              title = "Percent Hispanic Population",
              labels = c('<20%',
                         '20% - 40%',
                         '40% - 60%',
                         '60% - 80%',
                         '80% - 100%'))

Questions

  1. What new options and operations have we added to our code?

  2. How many values do we specify in the breaks vector, and how many bins are in the map legend? Why?

5.5 Point Maps

Choropleth maps are great, but point maps enable us to visualize our spatial data in another way.

If you know both mapping methods, you can show more information in a single map.

For example, point maps are a better way to map counts because the varying sizes of areas are deemphasized.

  • We can use the sf::st_centroid function to dynamically transform the county polygons to their centroids (point centers).

  • We then use the tm_dots function to create point maps dynamically from polygon data! Let’s take a look.

# County population counts as a point map!
tmap_mode('plot')
## tmap mode set to plotting
# Add the county polygon borders as a basemap
tm_shape(counties) + 
  tm_borders(col = "grey") +
  
# Then map the county centroids as points colored by population counts
  tm_shape(st_centroid(counties)) + 
  tm_dots(col = 'POP2012',
          palette = 'YlOrRd', 
          style = 'jenks',
          border.col = "black",  # dot borders only visible in interactive mode!
          border.lwd = 1,
          border.alpha = 1,
          size = .5,
          legend.show = TRUE) 
## Warning in st_centroid.sf(counties): st_centroid assumes attributes are constant
## over geometries of x

Converting polygons to their centroids like this is another useful type of data transformation for making effective maps.

More Point Data Maps

Let’s read in some data that is more typically encoded with point geometry - Alameda County schools.

schools_df <- read.csv(here("notebook_data",
                            "alco_schools.csv"))

head(schools_df)
##           X        Y                      Site               Address    City
## 1 -122.2388 37.74476 Amelia Earhart Elementary 400 Packet Landing Rd Alameda
## 2 -122.2519 37.73900       Bay Farm Elementary   200 Aughinbaugh Way Alameda
## 3 -122.2589 37.76206  Donald D. Lum Elementary    1801 Sandcreek Way Alameda
## 4 -122.2348 37.76525         Edison Elementary  2700 Buena Vista Ave Alameda
## 5 -122.2381 37.75396     Frank Otis Elementary      3010 Fillmore St Alameda
## 6 -122.2616 37.76911       Franklin Elementary  1433 San Antonio Ave Alameda
##   State Type API    Org
## 1    CA   ES 933 Public
## 2    CA   ES 932 Public
## 3    CA   ES 853 Public
## 4    CA   ES 927 Public
## 5    CA   ES 894 Public
## 6    CA   ES 893 Public

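Because we read it from a plain CSV file, it is an ordinary data.frame. Let’s promote it to an sf data.frame, using the X (longitude) and Y (latitude) columns as point coordinates in the WGS 84 coordinate reference system (EPSG code 4326).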

schools_sf <- st_as_sf(schools_df, 
                       coords = c('X','Y'),
                       crs = 4326)

Then we can map it.

plot(schools_sf)

Plotting the whole sf object draws one map per attribute column. This is useful because at a glance you can see the type of each variable and get a sense of its range of values.

The default sf::plot point map for a numeric data column is a proportional color map that linearly scales the color of the point symbol by the data values.

# Point map of API - Academic Performance Index
plot(schools_sf['API'])

Point maps with tmap

Let’s try creating the same map with tmap.

tmap_mode('plot')
## tmap mode set to plotting
tm_shape(schools_sf) + 
  tm_dots(col = "API")

The basic tmap graduated color map needs some customization to shine, especially in plot mode!

Adding context

We can add the outline of Alameda County to improve our map.

tmap_mode('plot')
## tmap mode set to plotting
# Add Alameda County outline
tm_shape(counties[counties$NAME=='Alameda',]) +
  tm_borders() +

# Add Alameda County Schools
tm_shape(schools_sf) + 
  tm_dots(col = "API") +

# Position the legend in the bottom left corner
tm_legend(position=c("left", "bottom"))

Customizing bins and colors

By default, tmap uses a yellow-orange-brown (YlOrBr) sequential color palette and the pretty classification scheme for point thematic maps. These are the same defaults tmap uses for choropleth maps. Note that point maps that symbolize data values by color are called graduated color maps. In spite of the different map names, the color and classification scheme options are almost identical in tmap! However, some options are different. For example, a size parameter makes sense for a point symbol but not for a polygon!

See ?tm_dots for more information about the options for customizing point maps. For example:

# API Graduated Color Map

# Add the county polygon
tm_shape(counties[counties$NAME=='Alameda',]) +
  tm_polygons(col="lightgrey") +

# Add the schools, sorted by API so that higher scores are drawn last (on top)
tm_shape(schools_sf[order(schools_sf$API),]) + 
  tm_dots(col = 'API', 
          size = 0.15,
          palette = 'Reds', 
          style = 'fixed',
          breaks = c(0, 200, 400, 600, 800, 1000),
          border.col = 'grey',
          legend.show = F) + 
  
  # Customize the legend
  tm_add_legend('fill', 
                title = 'Alameda County, school API scores',
                labels = c('<200', 
                           '[200,400)', 
                           '[400,600)', 
                           '[600,800)', 
                           '>800'),
                col = RColorBrewer::brewer.pal(5, "Reds")) +
  
  # position the legend
  tm_layout(legend.position = c('right', 'top'))

Proportional Symbol Maps

Another important type of point map is the proportional symbol map. These are like proportional color maps, but instead of associating symbol color with data values, they associate symbol size. You can make these in tmap with the tm_bubbles function.
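In a proportional symbol map, it is usually the symbol’s area, not its radius, that is made proportional to the data value. As we understand it, tmap’s documentation for tm_bubbles describes area-based scaling, possibly with a perceptual adjustment. The sketch below shows the square-root relationship using bubble_radius(), a hypothetical helper. The values are the four free/reduced-price lunch counts (RLunch) printed later in this lesson.

```r
# Hypothetical helper: radius for an area-proportional symbol
bubble_radius <- function(values, max_radius = 1) {
  # area = pi * r^2 should be proportional to the value, so r ~ sqrt(value)
  max_radius * sqrt(values / max(values, na.rm = TRUE))
}

# Free/reduced-price lunch counts (RLunch) for four schools in this lesson
rlunch <- c(372, 111, 642, 276)
r <- bubble_radius(rlunch, max_radius = 0.5)

# The symbol areas are in the same proportions as the data values
area <- pi * r^2
round(area / max(area), 3)
round(rlunch / max(rlunch), 3)
```

Suppose the radius were scaled linearly instead. The largest symbol (642) would then cover about (642/111)^2, or roughly 33 times, the area of the smallest (111), rather than about 6 times.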

The schools data does not contain any good variables for proportional symbol mapping, so we will read in a supplemental file of National Center for Education Statistics (NCES) data and join it to the school points.

df <- read.csv(here("notebook_data",
                    "other",
                    "PolicyMap_NCES_Data_20210429.csv"))

# take a look
head(df, 2)
##                              School.Name
## 1 California School for the Deaf-Fremont
## 2                      Dublin Elementary
##                                              Education.Agency
## 1 California School for the Deaf-Fremont (State Special Schl)
## 2                                              Dublin Unified
##       School.Location    City State   Zip           Type.of.School
## 1 39350 Gallaudet Dr. Fremont    CA 94538 Special Education School
## 2      7997 Vomac Rd.  Dublin    CA 94568           Regular School
##   School.status.at.time.of.last.report School.Level         County
## 1                                 Open        Other Alameda County
## 2                                 Open      Primary Alameda County
##                                        Grades.Offered Total.Students
## 1 Kindergarten, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12            375
## 2                         Kindergarten, 1, 2, 3, 4, 5            894
##   Full.time.equivalent.Classroom.Teachers Student.Teacher.Ratio
## 1                                   73.91                  5.07
## 2                                   33.60                 26.61
##   National.School.Lunch.Program.Status
## 1                                   No
## 2                                   No
##   Free.and.Reduced.price.Lunch.Eligible.Students        Title.I.Eligible
## 1                                            372                 Unknown
## 2                                            111 Title I Eligible School
##       School.wide.Title.I                     Title.I.Status
## 1                                                           
## 2 Not School-wide Title I Title I targeted assistance school
##                                           GreatSchools.Link
## 1 http://www.greatschools.net/modperl/browse_school/CA/2297
## 2   http://www.greatschools.net/modperl/browse_school/CA/55
##   NCES.Public.School.ID Urban.centric.Locale Magnet.School Charter.School
## 1           60000310347      Suburban, Large                             
## 2           60001906929      Suburban, Large                             
##   Shared.time.School Virtual.School Reconstituted Total.Students.1
## 1                 NA                           NA              375
## 2                 NA                           NA              894
##   Prekindergarten.Students Kindergarten.Students Grade.1.Students
## 1                       NA                    36               26
## 2                       NA                   350              300
##   Grade.2.Students Grade.3.Students Grade.4.Students Grade.5.Students
## 1               24               22               40               32
## 2              296              294              290              258
##   Grade.6.Students Grade.7.Students Grade.8.Students Grade.9.Students
## 1               60               48               52               84
## 2               NA               NA               NA               NA
##   Grade.10.Students Grade.11.Students Grade.12.Students
## 1                82                92               152
## 2                NA                NA                NA
##   High.School.Students.Earning.College.Credit.or.CTE.Students.Beyo
## 1                                                               NA
## 2                                                               NA
##   Adult.Education.Students Ungraded.Students White..Non.Hispanic.Students
## 1                        0                NA                          174
## 2                        0                NA                          582
##   Hispanic.Students Black..Non.Hispanic.Students Asian..Non.Hispanic.Students
## 1               382                           58                           88
## 2               296                           50                          658
##   American.Indian.Alaska.Native.Students
## 1                                     NA
## 2                                     12
##   Hawaiian.Native.Pacific.Islander.Students Two.or.More.Races.Students
## 1                                         4                         44
## 2                                        20                        170
##   Percent.White..Non.Hispanic Percent.Hispanic Percent.Black..Non.Hispanic
## 1                       23.20            50.93                        7.73
## 2                       32.55            16.55                        2.80
##   Percent.Asian..Non.Hispanic Percent.American.Indian.Alaska.Native
## 1                       11.73                                    NA
## 2                       36.80                                  0.67
##   Percent.Hawaiian.Native.Pacific.Islander Percent.Two.or.More.Races
## 1                                     0.53                      5.87
## 2                                     1.12                      9.51
##   All.Female.High.School Majority.Minority.High.School Inner.City.High.School
## 1                                                                            
## 2

Subset the data frame to keep only a few columns.

df2 <- df[c('School.Name',
           'Student.Teacher.Ratio',
           'Free.and.Reduced.price.Lunch.Eligible.Students')]

# Rename the columns
colnames(df2) <- c('Site',
                   'STRatio',
                   'RLunch')

# take a look
head(df2, 2)
##                                     Site STRatio RLunch
## 1 California School for the Deaf-Fremont    5.07    372
## 2                      Dublin Elementary   26.61    111

Merge the NCES data into the schools sf data frame, matching on school name (Site).

schools_sf2 <- merge(schools_sf, df2, by = "Site")

# take a look
head(schools_sf2, 2)
## Simple feature collection with 2 features and 9 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -122.2273 ymin: 37.75293 xmax: -122.1861 ymax: 37.78232
## Geodetic CRS:  WGS 84
##                        Site          Address    City State Type API    Org
## 1           Achieve Academy 1700 28th Avenue Oakland    CA   ES 788 Public
## 2 ACORN Woodland Elementary 1025 81st Avenue Oakland    CA   ES 782 Public
##   STRatio RLunch                   geometry
## 1   20.99    642 POINT (-122.2273 37.78232)
## 2   23.08    276 POINT (-122.1861 37.75293)
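By default, merge() keeps only rows whose key appears in both tables (an inner join), and keys must match exactly, character for character. Any school whose Site does not exactly match a School.Name in the NCES file is therefore dropped from schools_sf2. The sketch below uses plain data frames to show the difference that all.x = TRUE makes. The first two rows come from the output above; "Hypothetical School" and its API value are made up.

```r
# First two schools copied from the output above; the third is hypothetical
schools <- data.frame(Site = c("Achieve Academy", "ACORN Woodland Elementary", "Hypothetical School"),
                      API  = c(788, 782, 900))
nces <- data.frame(Site    = c("Achieve Academy", "ACORN Woodland Elementary"),
                   STRatio = c(20.99, 23.08),
                   RLunch  = c(642, 276))

# Default: inner join, so only schools found in both tables are kept
inner <- merge(schools, nces, by = "Site")

# all.x = TRUE: keep every school, filling NA where there is no NCES match
left <- merge(schools, nces, by = "Site", all.x = TRUE)

nrow(inner)
nrow(left)
```

Comparing nrow() before and after a merge is a quick way to spot schools that failed to match. We expect all.x = TRUE to work the same way when the first argument is an sf object, but that is worth verifying.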

Now we can create a proportional symbol map of the free/reduced-price lunch values.

tmap_mode('plot')
## tmap mode set to plotting
# Add the county polygon
tm_shape(counties[counties$NAME=='Alameda',]) +
  tm_polygons(col="lightgrey") +
  
tm_shape(schools_sf2) + 
  tm_bubbles(size = "RLunch", 
             col = "pink", 
             border.col = 'black', 
             title.size = "Students Eligible for Free/Reduced Lunch") +
  
  tm_layout(legend.position = c('right', 'top'))

Question

What does this code do?

ttm()
tmap_last()

5.6 Mapping Categorical Data

Mapping categorical data, also called qualitative data, is a bit more straightforward. There is no need to scale or classify data values. The goal of the color map is to provide a contrasting set of colors so as to clearly delineate different categories. Here’s a point-based example:

# Add the county polygon
tm_shape(counties[counties$NAME=='Alameda',]) +
  tm_polygons(col="lightgrey") +
  
tm_shape(schools_sf) + 
  tm_dots(col = 'Org', 
          size = 0.15, 
          palette = 'Set1',  # a qualitative palette for categorical data
          title = "School Type") +
  
tm_layout(legend.position = c('left', 'bottom'))

5.7 Recap

We learned about important data-driven mapping strategies and concepts, including:

  • Choropleth Maps
  • Color Palettes
  • Classification Schemes
  • Point maps

Points and polygons are not the only geometry types that we can use in data-driven mapping! You can also map linear features by associating data values with the color, shape, and size of features, but these types of maps are less common.

Exercise: Data-Driven Mapping

Practice creating choropleth and graduated color maps with the counties data. Pick one quantitative variable like MED_AGE and try different color palettes and classification schemes.

Then, try the following:

# Your code here

Solution hidden here!

To see it, right-click and select “inspect element” in your browser (or look in the 04_More_Data_More_Maps.Rmd file near line 427).


 D-Lab @ University of California - Berkeley
 Team Geo